Multilingual Neural Machine Translation System for Indic to Indic Languages
This paper presents an Indic-to-Indic (IL-IL) MNMT baseline model for 11 ILs,
trained on the Samanantar corpus and evaluated with the BLEU score on the
Flores-200 corpus. In addition, the languages are classified into three groups,
namely East Indo-Aryan (EI), Dravidian (DR), and West Indo-Aryan (WI), and the
effect of language relatedness on MNMT model performance is studied. Because
large corpora are available from English (EN) to ILs, MNMT IL-IL models that
use EN as a pivot are also built and examined. To this end, English-Indic
(EN-IL) models are also developed, with and without the use of related
languages. Results reveal that using related languages is beneficial for the WI
group only, detrimental for the EI group, and inconclusive for the DR group,
whereas it is useful for EN-IL models. Related language groups are therefore
used to develop the pivot MNMT models. Furthermore, the IL corpora are
transliterated from their native scripts to a modified ITRANS script, and the
best MNMT models from the previous approaches are built on the transliterated
corpus. The pivot models greatly improve on the MNMT baselines, with AS-TA
achieving the lowest BLEU score and PA-HI the highest. Among languages, AS, ML,
and TA achieve the lowest BLEU scores, whereas HI, PA, and GU perform best.
Transliteration also helps the models, with a few exceptions. Across all
languages, the largest score gains are observed for ML, TA, and BN, and the
smallest average gains for KN, HI, and PA. The best model obtained is the PA-HI
language pair trained on the PAWI transliterated corpus, which gives 24.29 BLEU.
Comment: 38 pages, 2 figures
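Every score above is a BLEU score. As a reminder of what the metric measures, here is a minimal Python sketch of corpus-level BLEU: clipped n-gram precisions up to 4-grams, combined by a geometric mean and scaled by a brevity penalty. It assumes pre-tokenized input, one reference per sentence, and no smoothing; the abstract does not say which BLEU implementation produced the reported numbers, and this sketch is not it.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Counter of all n-grams (as tuples) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def corpus_bleu(hypotheses, references, max_n=4):
    """Corpus-level BLEU in [0, 100] with uniform weights.

    Assumptions: hypotheses and references are aligned lists of token
    lists, one reference per sentence, no smoothing.
    """
    matches = [0] * max_n  # clipped n-gram matches per order
    totals = [0] * max_n   # hypothesis n-gram counts per order
    hyp_len = ref_len = 0
    for hyp, ref in zip(hypotheses, references):
        hyp_len += len(hyp)
        ref_len += len(ref)
        for n in range(1, max_n + 1):
            h, r = ngrams(hyp, n), ngrams(ref, n)
            # Clip each hypothesis n-gram count by its count in the reference.
            matches[n - 1] += sum(min(c, r[g]) for g, c in h.items())
            totals[n - 1] += sum(h.values())
    if min(matches) == 0:
        return 0.0  # without smoothing, any zero precision gives BLEU = 0
    log_precision = sum(math.log(m / t) for m, t in zip(matches, totals)) / max_n
    # Brevity penalty: penalize output that is shorter than the references.
    bp = 1.0 if hyp_len > ref_len else math.exp(1 - ref_len / hyp_len)
    return 100 * bp * math.exp(log_precision)
```

A hypothesis that matches its reference exactly scores 100; dropping the last word of a 7-token reference keeps every precision at 1 but applies a brevity penalty of exp(1 - 7/6).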
Optimization Matrix Factorization Recommendation Algorithm Based on Rating Centrality
Matrix factorization (MF) is widely used to mine user preferences from
explicit ratings in recommender systems. However, explicit ratings are not
equally reliable, because many factors, including commercial advertising and a
friend's recommendation, may affect a user's final evaluation of an item.
Identifying a user's reliable ratings is therefore critical to further
improving the performance of a recommender system. In this work, we analyze
how far each rating deviates from the overall rating distributions of the user
and of the item, and propose the notions of user-based rating centrality and
item-based rating centrality, respectively. Based on rating centrality, we
then measure the reliability of each user rating and provide an optimized
matrix factorization recommendation algorithm. Experimental results on two
popular recommendation datasets show that our method outperforms other matrix
factorization recommendation algorithms, especially on sparse datasets.
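The abstract does not define rating centrality, so the Python sketch below only illustrates the general idea of reliability-weighted matrix factorization. The weight in `centrality_weights` (a rating's closeness to its user's mean times its closeness to its item's mean, in standard deviations) is a hypothetical stand-in, not the paper's definition; `weighted_mf` then runs plain SGD on the weighted squared error with L2 regularization. All function names and hyperparameters are illustrative assumptions.

```python
import random
import statistics


def centrality_weights(ratings):
    """Hypothetical reliability weight in (0, 1] for each (user, item, rating).

    ASSUMPTION: a rating far (in standard deviations) from its user's mean
    and its item's mean is treated as less reliable. This is a stand-in,
    not the paper's rating-centrality definition.
    """
    by_user, by_item = {}, {}
    for u, i, r in ratings:
        by_user.setdefault(u, []).append(r)
        by_item.setdefault(i, []).append(r)

    def central(r, values):
        mu = statistics.fmean(values)
        sd = statistics.pstdev(values) or 1.0  # single rating or no spread
        return 1.0 / (1.0 + abs(r - mu) / sd)

    return [central(r, by_user[u]) * central(r, by_item[i]) for u, i, r in ratings]


def weighted_mf(ratings, n_factors=2, lr=0.01, reg=0.02, epochs=200, seed=0):
    """SGD matrix factorization on the weighted squared error plus L2 regularization."""
    rng = random.Random(seed)
    weights = centrality_weights(ratings)
    # dict.fromkeys keeps first-seen order, so initialization is reproducible.
    users = dict.fromkeys(u for u, _, _ in ratings)
    items = dict.fromkeys(i for _, i, _ in ratings)
    P = {u: [rng.gauss(0, 0.1) for _ in range(n_factors)] for u in users}
    Q = {i: [rng.gauss(0, 0.1) for _ in range(n_factors)] for i in items}
    data = list(zip(ratings, weights))
    for _ in range(epochs):
        rng.shuffle(data)
        for (u, i, r), w in data:
            pu, qi = P[u], Q[i]
            err = r - sum(a * b for a, b in zip(pu, qi))
            for k in range(n_factors):
                pk, qk = pu[k], qi[k]
                # Gradient step on w * err^2; the constant factor 2 is folded into lr.
                pu[k] += lr * (w * err * qk - reg * pk)
                qi[k] += lr * (w * err * pk - reg * qk)
    return P, Q


def predict(P, Q, u, i):
    """Predicted rating: dot product of the user and item factor vectors."""
    return sum(a * b for a, b in zip(P[u], Q[i]))
```

Down-weighting an unusual rating, rather than discarding it, keeps it in the training signal while limiting how strongly it pulls on the learned factors.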
A distance based clustering method for arbitrary shaped clusters in large datasets. Pattern Recognition 44(12
Clustering has been widely used in many fields of science, technology, and social science. Clusters in a dataset are naturally of arbitrary (non-convex) shape. Distance-based methods form an important class of clustering methods; however, they usually find clusters of convex shape. The classical single-link method is a distance-based clustering method that can find arbitrary shaped clusters, but it scans the dataset multiple times and requires O(n^2) time, where n is the size of the dataset. This is potentially a severe problem for large datasets. In this paper, we propose a distance-based clustering method, l-SL, to find arbitrary shaped clusters in a large dataset. In this method, the leaders clustering method is first applied to the dataset to derive a set of leaders; the single-link method (with a distance stopping criterion) is then applied to the set of leaders to obtain the final clustering. The l-SL method produces a flat clustering and is considerably faster than applying the single-link method to the dataset directly. The clustering produced by l-SL may deviate slightly from that of the single-link method (with the distance stopping criterion) applied directly to the dataset; to compensate for this deviation, an improvement method is also proposed. Experiments are conducted on standard real-world and synthetic datasets, and the results show the effectiveness of the proposed clustering methods for large datasets.
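The two-stage procedure described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: `tau` (the leaders threshold) and `h` (the single-link stopping distance) are hypothetical parameter names, Euclidean distance is assumed, and the paper's improvement method for compensating the deviation is not included.

```python
import math


def leaders(points, tau):
    """Leaders clustering in one scan of the data.

    Each point joins the first existing leader within distance tau;
    otherwise it becomes a new leader. Returns the indices of the leaders
    and, for every point, the position of its leader in that list.
    """
    leader_ids, assign = [], []
    for idx, p in enumerate(points):
        for k, li in enumerate(leader_ids):
            if math.dist(p, points[li]) <= tau:
                assign.append(k)
                break
        else:  # no leader close enough: p becomes a new leader
            leader_ids.append(idx)
            assign.append(len(leader_ids) - 1)
    return leader_ids, assign


def single_link_threshold(items, h):
    """Single-link clustering stopped at distance h, computed with union-find.

    Stopping once the closest pair of clusters is farther than h yields the
    connected components of the graph joining items within distance h.
    """
    parent = list(range(len(items)))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a in range(len(items)):
        for b in range(a + 1, len(items)):
            if math.dist(items[a], items[b]) <= h:
                parent[find(a)] = find(b)
    return [find(x) for x in range(len(items))]


def l_sl(points, tau, h):
    """Leaders first, then single-link on the leaders; points inherit their leader's label."""
    leader_ids, assign = leaders(points, tau)
    labels = single_link_threshold([points[i] for i in leader_ids], h)
    return [labels[k] for k in assign]
```

The quadratic pairwise loop in `single_link_threshold` runs over the m leaders rather than the n points, which is where the speed-up over single-link on the full dataset comes from when m is much smaller than n.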
A new similarity measure using Bhattacharyya coefficient for collaborative filtering in sparse data
Collaborative filtering (CF) is the most successful approach for personalized product or service recommendations. Neighborhood-based CF is an important class of CF that is simple, intuitive, and efficient, and it is widely used for product recommendation in the commercial domain. Typically, neighborhood-based CF uses a similarity measure to find users similar to an active user, or products similar to those the active user has rated. Traditional similarity measures use only the ratings of co-rated items when computing the similarity between a pair of users, so they are not suitable for sparse data. In this paper, we propose a similarity measure for neighborhood-based CF that uses all ratings made by a pair of users. The proposed measure determines the importance of each pair of rated items by exploiting the Bhattacharyya similarity. To show the effectiveness of the measure, we compared the performance of neighborhood-based CF using state-of-the-art similarity measures with that of CF based on the proposed measure. Recommendation results on a set of real data show that CF based on the proposed measure outperforms CF based on existing measures on various evaluation metrics.
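The Bhattacharyya coefficient between two items can be computed from their rating histograms, and this is what allows a measure to compare items that the two users did not both rate. The Python sketch below computes the coefficient exactly; `user_similarity`, which averages it over all pairs of items rated by the two users and multiplies it by a simple rating-agreement term, is a hypothetical aggregation for illustration only, not the paper's formula.

```python
import math
from collections import Counter


def rating_distribution(ratings, scale):
    """Fraction of an item's ratings that fall on each value of the rating scale."""
    counts = Counter(ratings)
    return [counts[v] / len(ratings) for v in scale]


def bhattacharyya(p, q):
    """Bhattacharyya coefficient sum_v sqrt(p_v * q_v): 1 if identical, 0 if disjoint."""
    return sum(math.sqrt(a * b) for a, b in zip(p, q))


def user_similarity(ru, rv, item_ratings, scale=(1, 2, 3, 4, 5)):
    """HYPOTHETICAL similarity between users u and v over ALL their rated items.

    ru, rv: {item: rating} for each user; item_ratings: {item: [all ratings]}.
    Each item pair (i, j) is weighted by the Bhattacharyya coefficient of the
    two items' rating distributions and by an assumed agreement term.
    """
    dist = {i: rating_distribution(rs, scale) for i, rs in item_ratings.items()}
    span = max(scale) - min(scale)
    total = 0.0
    for i, r_ui in ru.items():
        for j, r_vj in rv.items():
            # Assumed agreement: 1 for equal ratings, 0 for opposite ends of the scale.
            agreement = 1 - abs(r_ui - r_vj) / span
            total += bhattacharyya(dist[i], dist[j]) * agreement
    return total / (len(ru) * len(rv))
```

Because every pair (i, j) contributes, two users with no co-rated item can still receive a nonzero similarity, which is the property the abstract emphasizes for sparse data.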
Hidden location prediction using check-in patterns in location-based social networks
This is a post-peer-review, pre-copyedit version of an article published in Knowledge and Information Systems. The final authenticated version is available online at: https://doi.org/10.1007/s10115-018-1170-5
The check-in facility in a Location-Based Social Network (LBSN) enables people to share location information as well as real-life activities. Analysing these historical check-in series to predict the locations a user will visit next has been very popular in the research community. However, it has been found that people tend not to share privately visited locations and activities in an LBSN.
Research into extrapolating unchecked locations from historical data is limited.
Knowledge of hidden locations can benefit society in many ways. It may help investigating agencies identify places a suspect may have visited, marketing companies select potential customers for targeted marketing, and medical representatives identify areas for disease prevention and containment. In this paper, we propose an Associative Location Prediction Model (ALPM), which infers privately visited, unchecked locations from a published user trajectory. ALPM explores the association between a user's check-in data, a Hidden Markov Model, and the locations proximal to a published check-in to predict unchecked (hidden) locations. We evaluate ALPM on the real-world Gowalla LBSN dataset for users residing in Beijing, China. Experimental results show that the proposed model outperforms existing state-of-the-art work in the literature.
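As a much simplified illustration of inferring an unchecked stop, the Python sketch below scores each candidate location h between two consecutive published check-ins by P(prev -> h) * P(h -> next) under a first-order Markov chain estimated from check-in sequences, keeping only candidates within `radius_km` of the previous check-in. This is an assumed heuristic, not ALPM, and it omits the emission component of a Hidden Markov Model; all names, the haversine distance, and the radius filter are choices made for this sketch.

```python
import math


def haversine_km(a, b):
    """Great-circle distance in km between two (lat, lon) pairs given in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (*a, *b))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * math.asin(math.sqrt(h))


def transition_probs(trajectories):
    """First-order Markov transition probabilities estimated from check-in sequences."""
    counts = {}
    for traj in trajectories:
        for a, b in zip(traj, traj[1:]):
            counts.setdefault(a, {}).setdefault(b, 0)
            counts[a][b] += 1
    return {a: {b: c / sum(nxt.values()) for b, c in nxt.items()}
            for a, nxt in counts.items()}


def predict_hidden(prev_loc, next_loc, trans, coords, radius_km):
    """Rank candidate hidden stops between two published check-ins.

    ASSUMED score: P(prev -> h) * P(h -> next), restricted to locations
    within radius_km of the previous published check-in.
    """
    scores = {}
    for h in coords:
        if h in (prev_loc, next_loc):
            continue
        if haversine_km(coords[prev_loc], coords[h]) > radius_km:
            continue  # not proximal to the published check-in
        s = trans.get(prev_loc, {}).get(h, 0.0) * trans.get(h, {}).get(next_loc, 0.0)
        if s > 0:
            scores[h] = s
    return sorted(scores.items(), key=lambda kv: -kv[1])
```

The proximity filter reflects the abstract's use of locations around a published check-in; the Markov scores alone would also rank distant but frequently visited places.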